Open In Colab

Emotion AI¶

Nutshell¶

This project is currently under development.

In this project I build a program that classifies emotions from images of human faces, following the course Modern Artificial Intelligence, taught by Dr. Ryan Ahmed, Ph.D., MBA.

The data I use come from https://www.kaggle.com/c/facial-keypoints-detection/overview and consist of over 20,000 facial images labeled with a facial expression/emotion, plus approximately 2,000 images with keypoint annotations.

The program will train two models:

  1. one that detects facial keypoints
  2. one that classifies emotions.

These models are then combined into one model that provides a holistic prediction of the emotion as the output.

A short recap of artificial neural networks¶

Artificial neurons are built in a similar way to biological neurons. An artificial neuron takes in signals through input channels (the dendrites of a biological neuron), processes the information through a transfer function (the cell body) and generates an output (which in a biological neuron would travel through the axon).


Fig. 1. Side by side view of artificial and biological neurons. Credit: Top image from Introduction to Psychology (A critical approach) Copyright © 2021 by Rose M. Spielman; Kathryn Dumper; William Jenkins; Arlene Lacombe; Marilyn Lovett; and Marion Perlmutter licensed under a Creative Commons Attribution 4.0 International License. Bottom image Chrislb, CC BY-SA 3.0 , via Wikimedia Commons

For example, let us consider an artificial neuron (AN) that takes three inputs: $x_1$, $x_2$, and $x_3$. We can then express the output of the artificial neuron mathematically as $y = \phi(x_1w_1 + x_2w_2 + x_3w_3 + b)$. Here $y$ is the output and the $w$s are the weights assigned to each input signal. $b$ is a bias term added to the weighted sum of inputs. $\phi$ is the activation function.

Some common modern activation functions used in neural networks are, for example, ReLU, GELU and the logistic activation function. ReLU is short for Rectified Linear Unit and is defined as $\phi(x) = \max(0, x)$. ReLU is recommended for hidden layers, since it outputs a linear response for positive values. This helps maintain larger gradients and makes training deep networks more feasible.

The Gaussian Error Linear Unit (GELU) is a smoother version of ReLU and is defined as $x\Phi(x)$, where $\Phi(x)$ is the Gaussian cumulative distribution function.

The logistic activation function, also called the sigmoid function, is defined as $\phi(x) = \frac{1}{1+e^{-x}}$. It maps any real number into the interval (0, 1) and is therefore very useful in output layers.
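These three activation functions can be sketched as plain scalar Python functions, using `math.erf` for the Gaussian CDF (the function names here are mine):

```python
import math

def relu(x: float) -> float:
    # Rectified linear unit: identity for positive inputs, zero otherwise
    return max(0.0, x)

def gelu(x: float) -> float:
    # GELU: x times the Gaussian CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(x: float) -> float:
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```

Note that for large positive inputs GELU approaches the identity, just like ReLU, while the sigmoid saturates towards 1.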


Training¶

Neural networks of this kind are trained with labeled data. The available data is generally divided into 80% training data and 20% testing data. It is also recommended to further divide the training portion into an actual training set (e.g. 60% of all data) and a validation set (e.g. 20%).
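With scikit-learn (also used later in this notebook), the 60/20/20 split can be sketched as two successive calls to `train_test_split`; the array contents here are dummies:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)  # dummy features
y = np.arange(1000)                 # dummy labels

# First hold out 20% of all data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then split the remaining 80% into 60% train / 20% validation (0.25 * 0.8 = 0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)
```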

Training is done by adjusting the weights of the network: the cost function is iteratively minimised using, for example, the gradient descent optimisation algorithm. It works by calculating the gradient of the cost function and then taking a step in the negative gradient direction, repeating until a local or global minimum is reached.

A typical choice for a cost function is the quadratic (mean squared error) loss, which is formulated as $f_{loss}(w,b)= \frac{1}{N}\sum^{N}_{i=1}(\hat y_i-y_i)^2$.
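In code the quadratic loss is the mean of the squared residuals; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def mse(y_hat, y):
    # Mean of the squared residuals over the N samples
    return float(np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2))
```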

Gradient descent algorithm:

1. Pick random initial values for the weights.

2. Calculate the gradient of the loss function, $\frac{\partial f_{loss}}{\partial w}$, at the current weights.

3. Calculate the step size, i.e. how much we will update the weights.

step size = learning rate * gradient $=\alpha*\frac{\partial f_{loss}}{\partial w}$

4. Update the parameters and repeat from step 2.

new weight = old weight - step size, i.e. $w_{new}=w_{old}-\alpha*\frac{\partial f_{loss}}{\partial w}$
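These steps can be sketched on a one-dimensional toy loss $f(w) = (w-3)^2$, whose minimum lies at $w = 3$ (the function name and hyperparameter values are illustrative):

```python
def grad(w):
    # Derivative of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = -5.0      # random-ish initial weight
alpha = 0.1   # learning rate

for _ in range(200):
    step = alpha * grad(w)   # step size = learning rate * gradient
    w = w - step             # new weight = old weight - step size
```

After 200 iterations `w` has converged essentially to 3.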

Below is an example of searching for the minimum of a U-shaped function with gradient descent. Usually the problem is multidimensional, but the higher-dimensional case is solved in the same way.


Testing various learning rates helps understand the importance of choosing the training parameters well.


As shown above, a too large learning rate can lead to missing the global minimum and/or slow convergence. Equally problematic is a too small learning rate, with which the model barely learns. To solve these problems, there are several approaches that adjust the learning rate dynamically.

Momentum is analogous to a ball's tendency to keep rolling downhill. Momentum is used to speed up learning when the error-cost gradient keeps pointing in the same direction for a long time, and to slow down when a level area is reached. Momentum is controlled by a parameter analogous to the mass of the rolling ball. A large momentum helps avoid getting stuck in local minima, but might also push through the minimum we wish to find. Thus, the parameter has to be selected carefully.

Learning rates can also be adjusted through decay, which reduces the learning rate by a certain amount after a fixed number of epochs. This can help in situations like the one above, where a too large learning rate makes the learning jump back and forth over a minimum.

Adagrad and Adam are examples of popular adaptive optimisation algorithms that build on gradient descent.
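Momentum and step decay can both be sketched on the same kind of one-dimensional toy loss; the coefficients below are illustrative, not tuned recommendations:

```python
def grad(w):
    # Derivative of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, v = -5.0, 0.0
alpha, beta = 0.05, 0.9   # learning rate and momentum coefficient

for epoch in range(1, 301):
    if epoch % 100 == 0:
        alpha *= 0.5                  # step decay: halve the learning rate every 100 epochs
    v = beta * v - alpha * grad(w)    # velocity: a decaying accumulation of past gradients
    w = w + v                         # the "ball" keeps part of its previous motion
```

The velocity term `v` is what lets the iterate coast across flat regions and shallow local minima.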

Network architectures¶

The artificial neurons are connected to each other to form neural networks and a plethora of different network architectures exist. To harness the power of AI, it is necessary to know which architecture serves the intended purpose best. Below are three common architectures and their applications.

Recurrent Neural Networks (RNNs) handle sequential data by maintaining a hidden state that captures information about previous elements in the sequence. Therefore they are great for contexts where the output depends on previous inputs, for example time series and natural language processing.

Generative Adversarial Networks (GANs) consist of two neural networks - the Generator and the Discriminator. They spar with each other in a zero-sum game framework, where the generator creates synthetic data that resembles real data and the discriminator evaluates whether it is real or not. This drives the generator to output increasingly realistic data. GANs are the natural choice for much image generation and editing, but also for anomaly detection in industrial and security contexts: they can model regular patterns and subsequently detect anomalies by comparing generated outputs with real inputs.

Convolutional Neural Networks (CNN) are designed to process data with a grid-like topology and are most commonly used in image analysis. They utilise convolutional layers to learn spatial hierarchies by applying filters (kernels) that slide (convolve) over the input. They usually involve pooling layers that reduce the spatial dimensions and fully connected layers that map the extracted features to outputs.


Fig. 2. Convolutional neural network. Credit: Aphex34, CC BY-SA 4.0, via Wikimedia Commons

In the Emotion AI, I will use a Residual Neural Network (ResNet). ResNet's architecture includes "skip connections", which enable training very deep networks without vanishing-gradient issues. The vanishing gradient problem occurs when the gradient becomes very small as it is back-propagated to earlier layers. A skip connection works by passing the input of one layer to a layer further down in the network; this is also called identity mapping. The ResNet model that I use has been pretrained on the ImageNet dataset.


Fig. 3. Identity mapping. Credit: LunarLullaby, CC BY-SA 4.0, via Wikimedia Commons
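A toy NumPy sketch of the identity-mapping idea: the shortcut adds the block's untouched input to the main-path output before the final activation (the layer shapes and function names here are illustrative, not the ResNet built below):

```python
import numpy as np

def dense_relu(x, W, b):
    # An ordinary fully connected layer with a ReLU activation
    return np.maximum(0.0, x @ W + b)

def residual_block(x, W1, b1, W2, b2):
    # Main (long) path: two fully connected transformations
    out = dense_relu(x, W1, b1)
    out = out @ W2 + b2
    # Shortcut path: add the input back, then activate.
    # Even if the main path learns nothing (out == 0), the input survives.
    return np.maximum(0.0, out + x)
```

With all-zero weights the block reduces to ReLU(x), so gradients can always flow through the shortcut regardless of what the main path does.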

Part 1. Key facial points detection¶

In this section I program the DL model, a convolutional neural network with residual blocks, to predict facial keypoints. The data set is from https://www.kaggle.com/c/facial-keypoints-detection/overview.

The dataset consists of input images with 15 facial keypoints each. The training.csv file has 7049 face images with corresponding keypoint locations. The test.csv file has face images only and will be used to test the model. The Image column used here holds 2,140 images (shape (2140,)), each stored as a single space-separated string of pixel values. That string has to be transformed into the real shape of the image, (96, 96). Thus we create a 1-D array from the string and reshape it into a 2-D array.

The model I build will have the architecture presented below. The Resblock consists of two different types of blocks: a convolution block and an identity block. As seen below, both of them have an additional short path that adds the original input to the output. For the convolution block this includes a few extra steps to shape the input to the same dimensions as the output of the longer path.

Final model architecture and Resblock architecture (figures).
key_points_df['Image'].shape
key_points_df['Image'][0]
type(key_points_df['Image'][0])

key_points_df['Image'] = key_points_df['Image'].apply(lambda img: np.fromstring(img, dtype = int, sep = ' ').reshape(96,96))
key_points_df['Image'][0].shape
(96, 96)
key_points_df.describe()
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_x nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y
count 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 ... 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000
mean 66.221549 36.842274 29.640269 37.063815 59.272128 37.856014 73.412473 37.640110 36.603107 37.920852 ... 47.952141 57.253926 63.419076 75.887660 32.967365 76.134065 48.081325 72.681125 48.149654 82.630412
std 2.087683 2.294027 2.051575 2.234334 2.005631 2.034500 2.701639 2.684162 1.822784 2.009505 ... 3.276053 4.528635 3.650131 4.438565 3.595103 4.259514 2.723274 5.108675 3.032389 4.813557
min 47.835757 23.832996 18.922611 24.773072 41.779381 27.190098 52.947144 26.250023 24.112624 26.250023 ... 24.472590 41.558400 43.869480 57.023258 9.778137 56.690208 32.260312 56.719043 33.047605 57.232296
25% 65.046300 35.468842 28.472224 35.818377 58.113054 36.607950 71.741978 36.102409 35.495730 36.766783 ... 46.495330 54.466000 61.341291 72.874263 30.879288 73.280038 46.580004 69.271669 46.492000 79.417480
50% 66.129065 36.913319 29.655440 37.048085 59.327154 37.845220 73.240045 37.624207 36.620735 37.920336 ... 47.900511 57.638582 63.199057 75.682465 33.034022 75.941985 47.939031 72.395978 47.980854 82.388899
75% 67.332093 38.286438 30.858673 38.333884 60.521492 39.195431 74.978684 39.308331 37.665280 39.143921 ... 49.260657 60.303524 65.302398 78.774969 35.063575 78.884031 49.290000 75.840286 49.551936 85.697976
max 78.013082 46.132421 42.495172 45.980981 69.023030 47.190316 87.032252 49.653825 47.293746 44.887301 ... 65.279654 75.992731 84.767123 94.673637 50.973348 93.443176 61.804506 93.916338 62.438095 95.808983

8 rows × 30 columns

We perform a sanity check for the data by visualising 64 randomly chosen images along with their key facial points.


Image augmentation¶

Here we create an additional data set in which the images are changed slightly to improve the generalisation of the final AI model. We want more data and more variability in e.g. orientation, lighting conditions and image size. This reduces the likelihood of overfitting and helps ensure that the model learns the meaningful "concepts" of emotion recognition. We create this extra data set by copying the original data set and tweaking the copy.

I will create 3 types of augmented images:

  1. horizontally flipped images
  2. images with randomly increased brightness
  3. vertically flipped images
(4280, 31)
(6420, 31)
(8560, 31)
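A minimal sketch of the first two augmentations for a 96x96 image with keypoint coordinates; the mirroring convention x -> 96 - x and the brightness range are my assumptions, not necessarily the exact values used in this notebook:

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_horizontal(image, keypoints_x):
    # Mirror the image left-right and mirror the keypoint x-coordinates
    flipped = image[:, ::-1]
    flipped_x = 96.0 - np.asarray(keypoints_x, dtype=float)
    return flipped, flipped_x

def random_brightness(image, low=1.0, high=2.0):
    # Scale pixel values by a random factor and clip back to the 8-bit range
    factor = rng.uniform(low, high)
    return np.clip(image * factor, 0, 255)
```

Note that flipping the image alone is not enough: the keypoint labels must be transformed with it, otherwise the augmented targets would be wrong.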

Data normalization and scaling¶

I normalize the image pixel values to the range 0 to 1. This generally improves training in neural networks.
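The scaling step itself is a single division; a minimal sketch on a dummy stack of 8-bit grayscale images shaped like the real data (the notebook applies the same idea to the augmented images):

```python
import numpy as np

# Dummy stack of four 8-bit grayscale images, shaped like the real data
images = np.random.randint(0, 256, size=(4, 96, 96, 1)).astype(np.float32)

# Scale pixel values from [0, 255] into [0, 1]
img_array = images / 255.0
```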

# Obtain the x and y coordinates to be used as target
img_target = augmented_df[:,:30]
img_target = np.asarray(img_target).astype(np.float32)
img_target.shape
(8560, 30)
# Split the data into train and test data
X_train_kp, X_test_kp, y_train_kp, y_test_kp = train_test_split(img_array, img_target, test_size=0.2, random_state=42)
X_train_kp.shape
(6848, 96, 96, 1)
X_test_kp.shape
(1712, 96, 96, 1)

Building the Residual Neural Network model for key facial points detection¶

Kernels are used to modify the input by sweeping them over it, as shown in this animation:

2D Convolution Animation

Fig. 4 Performing a convolution on 6x6 input with a 3x3 kernel using stride 1x1. Credit: Michael Plotke, CC BY-SA 3.0, via Wikimedia Commons.

For example, we could perform a 2D convolution for our input with this command:

X = Conv2D(filters=64, kernel_size=(7,7), strides=(2,2), kernel_initializer = glorot_uniform(seed=0))(X_input)

Here we tell the function that we want to

  • use 64 distinct filters (each one is a trainable 7×7 “weight grid”).
  • use stride 2x2, i.e., the filter jumps 2 pixels at a time, effectively “skipping” every other location.
  • initialise the kernels with the glorot_uniform method, aka Xavier uniform initialization. This draws samples from a uniform distribution whose range is determined by the number of input and output units.
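The sliding-kernel operation itself can be sketched naively in NumPy (single channel, no padding, no bias; `conv2d` is my name for it, not Keras's):

```python
import numpy as np

def conv2d(x, kernel, stride=1):
    # "Valid" (no padding) 2-D convolution: slide the kernel over the input
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the kernel with the patch under it, then sum
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out
```

As in Fig. 4, a 6x6 input convolved with a 3x3 kernel at stride 1 yields a 4x4 output; stride 2 would shrink it to 2x2.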

In this section I define the model architecture using Keras. Below is the code to generate Resblocks.

# @title Resblock

def res_block(X, filter, stage):
  """
  Implementation of the Resblock.

  Arguments:
  X -- input tensor
  filter -- tuple/list of three integers (f1, f2, f3), the number of filters for each conv layer
  stage -- string, used to name the layers uniquely

  Returns:
  X -- output tensor of the res block
  """
  ### 1: Convolutional block###
  # Make a copy of the input
  X_shortcut = X

  f1, f2, f3 = filter

  # ----Long (main) path-----
  # Conv2d
  X = Conv2D(f1, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_a', \
             kernel_initializer = glorot_uniform(seed=0))(X)
  # MaxPool2D
  X = MaxPool2D(pool_size=(2,2))(X)
  # BatchNorm,ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_a')(X)
  X = Activation('relu')(X)

  # Conv2D (kernel 3x3)
  X = Conv2D(f2, kernel_size = (3,3), strides = (1,1), padding = 'same', name=str(stage)+'convblock'+'_conv_b', \
            kernel_initializer = glorot_uniform(seed=0))(X)
  # BatchNorm, ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_b')(X)
  X = Activation('relu')(X)

  #Conv2D
  X = Conv2D(f3, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_c', \
             kernel_initializer = glorot_uniform(seed=0))(X)
  #BatchNorm, ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_c')(X)


  # ----Short path----

  # Conv2D
  X_shortcut = Conv2D(f3, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_short', \
                      kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
  # MaxPool2D and Batchnorm
  X_shortcut = MaxPool2D(pool_size=(2,2))(X_shortcut)
  X_shortcut = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_short')(X_shortcut)


  # ----Add Paths together----
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  ### 2: Identity block 1 ###
  # Save the input value (shortcut path)
  X_shortcut = X
  block = 'iden1'
  # First component: Conv2D -> BatchNorm -> ReLU
  X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
  X = Activation('relu')(X)

  # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
  X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
  X = Activation('relu')(X)

  # Third component: Conv2D (1x1) -> BatchNorm
  X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)

  # Add shortcut value to the main path
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  ### 3: Identity block 2 ###
   # Save the input value (shortcut path)
  X_shortcut = X
  block = 'iden2'
  # First component: Conv2D -> BatchNorm -> ReLU
  X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
  X = Activation('relu')(X)

  # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
  X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
  X = Activation('relu')(X)

  # Third component: Conv2D (1x1) -> BatchNorm
  X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)

  # Add shortcut value to the main path
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  return X

Now that the Resblock is defined we can build the final model.

# @title Final Resnet Neural Network model

input_shape = (96,96,1)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage 1
X = Conv2D(filters = 64, kernel_size = (7,7), strides = (2,2), name='conv1', \
           kernel_initializer = glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)

# Stage 2
X = res_block(X, filter =  [64, 64, 256], stage = 'res1')

# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res2')

# We could also add more resblocks if we want
# X = res_block(X, filter= [256,256,1024], stage= 'res3')

# Average pooling
X = AveragePooling2D((2,2), name = 'avg_pool')(X)

# Flatten
X = Flatten()(X)

# Dense, ReLU, Dropout
X = Dense(4096, activation = 'relu')(X)
X = Dropout(0.2)(X)
X = Dense(2048, activation = 'relu')(X)
X = Dropout(0.1)(X)
X = Dense(30, activation = 'relu')(X)

model_1_facialKeyPoints = Model(inputs = X_input, outputs = X)
Model: "functional_1"
+---------------------+-------------------+------------+-------------------+
| Layer (type)        | Output Shape      |    Param # | Connected to      |
+---------------------+-------------------+------------+-------------------+
| input_layer_2       | (None, 96, 96, 1) |          0 | -                 |
| (InputLayer)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| zero_padding2d_2    | (None, 102, 102,  |          0 | input_layer_2[0]… |
| (ZeroPadding2D)     | 1)                |            |                   |
+---------------------+-------------------+------------+-------------------+
| conv1 (Conv2D)      | (None, 48, 48,    |      3,200 | zero_padding2d_2… |
|                     | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| bn_conv1            | (None, 48, 48,    |        256 | conv1[0][0]       |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_38       | (None, 48, 48,    |          0 | bn_conv1[0][0]    |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_10    | (None, 23, 23,    |          0 | activation_38[0]… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 23, 23,    |      4,160 | max_pooling2d_10… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_11    | (None, 11, 11,    |          0 | res1convblock_co… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_a  | (None, 11, 11,    |        256 | max_pooling2d_11… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_39       | (None, 11, 11,    |          0 | res1convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 11, 11,    |     36,928 | activation_39[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_b  | (None, 11, 11,    |        256 | res1convblock_co… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_40       | (None, 11, 11,    |          0 | res1convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 23, 23,    |     16,640 | max_pooling2d_10… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 11, 11,    |     16,640 | activation_40[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_12    | (None, 11, 11,    |          0 | res1convblock_co… |
| (MaxPooling2D)      | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_c  | (None, 11, 11,    |      1,024 | res1convblock_co… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_s… | (None, 11, 11,    |      1,024 | max_pooling2d_12… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_12 (Add)        | (None, 11, 11,    |          0 | res1convblock_bn… |
|                     | 256)              |            | res1convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_41       | (None, 11, 11,    |          0 | add_12[0][0]      |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_conv_a    | (None, 11, 11,    |     16,448 | activation_41[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_bn_a      | (None, 11, 11,    |        256 | res1iden1_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_42       | (None, 11, 11,    |          0 | res1iden1_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_conv_b    | (None, 11, 11,    |     36,928 | activation_42[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_bn_b      | (None, 11, 11,    |        256 | res1iden1_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_43       | (None, 11, 11,    |          0 | res1iden1_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_conv_c    | (None, 11, 11,    |     16,640 | activation_43[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_bn_c      | (None, 11, 11,    |      1,024 | res1iden1_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_13 (Add)        | (None, 11, 11,    |          0 | res1iden1_bn_c[0… |
|                     | 256)              |            | activation_41[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_44       | (None, 11, 11,    |          0 | add_13[0][0]      |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_conv_a    | (None, 11, 11,    |     16,448 | activation_44[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_bn_a      | (None, 11, 11,    |        256 | res1iden2_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_45       | (None, 11, 11,    |          0 | res1iden2_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_conv_b    | (None, 11, 11,    |     36,928 | activation_45[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_bn_b      | (None, 11, 11,    |        256 | res1iden2_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_46       | (None, 11, 11,    |          0 | res1iden2_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_conv_c    | (None, 11, 11,    |     16,640 | activation_46[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_bn_c      | (None, 11, 11,    |      1,024 | res1iden2_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_14 (Add)        | (None, 11, 11,    |          0 | res1iden2_bn_c[0… |
|                     | 256)              |            | activation_44[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_47       | (None, 11, 11,    |          0 | add_14[0][0]      |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |     32,896 | activation_47[0]… |
| (Conv2D)            | 128)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_13    | (None, 5, 5, 128) |          0 | res2convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_a  | (None, 5, 5, 128) |        512 | max_pooling2d_13… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_48       | (None, 5, 5, 128) |          0 | res2convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 5, 5, 128) |    147,584 | activation_48[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_b  | (None, 5, 5, 128) |        512 | res2convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_49       | (None, 5, 5, 128) |          0 | res2convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |    131,584 | activation_47[0]… |
| (Conv2D)            | 512)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 5, 5, 512) |     66,048 | activation_49[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_14    | (None, 5, 5, 512) |          0 | res2convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_c  | (None, 5, 5, 512) |      2,048 | res2convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_s… | (None, 5, 5, 512) |      2,048 | max_pooling2d_14… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_15 (Add)        | (None, 5, 5, 512) |          0 | res2convblock_bn… |
|                     |                   |            | res2convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_50       | (None, 5, 5, 512) |          0 | add_15[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_a    | (None, 5, 5, 128) |     65,664 | activation_50[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_a      | (None, 5, 5, 128) |        512 | res2iden1_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_51       | (None, 5, 5, 128) |          0 | res2iden1_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_b    | (None, 5, 5, 128) |    147,584 | activation_51[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_b      | (None, 5, 5, 128) |        512 | res2iden1_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_52       | (None, 5, 5, 128) |          0 | res2iden1_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_c    | (None, 5, 5, 512) |     66,048 | activation_52[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_c      | (None, 5, 5, 512) |      2,048 | res2iden1_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_16 (Add)        | (None, 5, 5, 512) |          0 | res2iden1_bn_c[0… |
|                     |                   |            | activation_50[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_53       | (None, 5, 5, 512) |          0 | add_16[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_a    | (None, 5, 5, 128) |     65,664 | activation_53[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_a      | (None, 5, 5, 128) |        512 | res2iden2_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_54       | (None, 5, 5, 128) |          0 | res2iden2_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_b    | (None, 5, 5, 128) |    147,584 | activation_54[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_b      | (None, 5, 5, 128) |        512 | res2iden2_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_55       | (None, 5, 5, 128) |          0 | res2iden2_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_c    | (None, 5, 5, 512) |     66,048 | activation_55[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_c      | (None, 5, 5, 512) |      2,048 | res2iden2_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_17 (Add)        | (None, 5, 5, 512) |          0 | res2iden2_bn_c[0… |
|                     |                   |            | activation_53[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_56       | (None, 5, 5, 512) |          0 | add_17[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| avg_pool            | (None, 2, 2, 512) |          0 | activation_56[0]… |
| (AveragePooling2D)  |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| flatten_2 (Flatten) | (None, 2048)      |          0 | avg_pool[0][0]    |
+---------------------+-------------------+------------+-------------------+
| dense_3 (Dense)     | (None, 4096)      |  8,392,704 | flatten_2[0][0]   |
+---------------------+-------------------+------------+-------------------+
| dropout_2 (Dropout) | (None, 4096)      |          0 | dense_3[0][0]     |
+---------------------+-------------------+------------+-------------------+
| dense_4 (Dense)     | (None, 2048)      |  8,390,656 | dropout_2[0][0]   |
+---------------------+-------------------+------------+-------------------+
| dropout_3 (Dropout) | (None, 2048)      |          0 | dense_4[0][0]     |
+---------------------+-------------------+------------+-------------------+
| dense_5 (Dense)     | (None, 30)        |     61,470 | dropout_3[0][0]   |
└---------------------┴-------------------┴------------┴-------------------┘
 Total params: 18,016,286 (68.73 MB)
 Trainable params: 18,007,710 (68.69 MB)
 Non-trainable params: 8,576 (33.50 KB)


  

Explanations of components¶

The ZeroPadding2D layer adds a border of zeros (3 pixels wide) around the input image. This prevents information loss at the edges during convolutions.

Conv2D is the cake base of the convolutional block. It applies a set of filters to the input image, sliding them across it with a set stride. This is how features are extracted from the image.

The BatchNormalisation layer normalizes the output of the convolution, making training more stable. We can say it is the smooth cream layer on our convolution cake.

The ReLU activation function introduces non-linearity to the model.

MaxPooling2D reduces the spatial dimensions of the feature maps by taking the maximum value in a window, downsampling the output. After the Res-blocks, AveragePooling2D is used similarly, except it calculates the average value within the window; it also reduces the size of the feature maps. To give an impression of the impact of pooling: if we removed the MaxPooling2D layers from the Res-blocks, the final model would have 256 million parameters instead of 18 million.
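To make the max vs. average distinction concrete, here is a toy NumPy sketch (not the Keras layers themselves) of non-overlapping 2×2 pooling over a small feature map:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling (stride == window size) over a 2-D array."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]          # crop to a multiple of the window
    blocks = x.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 2., 3., 0.],
                 [4., 5., 6., 1.],
                 [0., 1., 2., 3.],
                 [7., 8., 9., 4.]])
pooled_max = pool2d(fmap, mode="max")   # [[5., 6.], [8., 9.]]
pooled_avg = pool2d(fmap, mode="avg")   # [[3., 2.5], [4., 4.5]]
```

Each 2×2 window collapses to a single value, halving both spatial dimensions, which is exactly why pooling cuts the parameter count of the layers that follow.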

Flatten converts the multi-dimensional feature maps into a single, long vector, preparing the data for the fully connected layers.

Dense creates a fully connected layer where each neuron is connected to every neuron in the previous layer. These fully connected layers process the features extracted by the convolutional layers.

Dropout layers are a regularisation technique that drops a set percentage of the neurons during training by setting them to zero. This makes the model less likely to overfit and decreases the interdependency between the neurons, improving the performance of the network and the generalisability of the model.
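A minimal NumPy sketch of inverted dropout (the scaling variant Keras uses), assuming an array of activations:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(x, rate=0.25, training=True):
    """Inverted dropout: zero a fraction `rate` of activations during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training:
        return x                          # at inference dropout is the identity
    mask = rng.random(x.shape) >= rate    # keep each unit with probability 1 - rate
    return x * mask / (1.0 - rate)

acts = np.ones(100_000)
dropped = dropout(acts, rate=0.25)
# roughly 25 % of the entries are zero, while the mean stays close to 1
```

The rescaling by 1/(1 - rate) is what lets the same network run at inference time with dropout simply switched off.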

The final model has a very complex structure with 18 million trainable parameters, which allows it to learn to identify emotions as well as, or even better than, an average human. However, too many parameters can lead to problems such as overfitting and slow or non-converging training. Optimising this many parameters is not a trivial task.

Compiling and training the model¶

I will use the Adam optimization method for the training. Adam is a computationally efficient stochastic gradient method that combines gradient descent with momentum and the RMSP algorithm.

As discussed earlier, the momentum speeds the training by accelerating the gradients by adding a fraction of the previous gradient to the current one. The RMSP or Root Mean Square Propagation is an adaptive learning algorithm that takes the 'exponential moving average' of the gradients. In other words, it adapts the learning rate for each parameter by keeping track of an exponentially decaying average of past squared gradients.

The algorithm proceeds as follows:

1. Calculate the gradient $g_t$

$g_t = \frac{\delta L }{\delta w_t}$

2. Update the biased first moment estimate $m_t$

$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$

This is similar to calculating the momentum as we keep track of the decaying average of past gradients.

3. Update the biased second moment estimate $v_t$

$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$

This is similar to RMSP as we keep track of an exponentially decaying average of past squared gradients.

4. Bias correction for $m_t$ and $v_t$

Especially at the beginning of training, $m_t$ and $v_t$ are biased toward zero (because they are initialised at zero). Adam corrects this as follows:

$\hat m = \frac{m_t}{1-\beta_1^t}$, $\hat v = \frac{v_t}{1-\beta_2^t}$

5. Parameter update

$w_{t+1} = w_t - \alpha_t\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}$

where,

$g_t$ = gradient of the loss with respect to the parameters at iteration $t$

$\alpha_t$ = learning rate at iteration $t$

$\beta_1, \beta_2$ = decay rates for the moment estimates

$\epsilon$ = small constant to prevent division by zero
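The five steps can be sketched in plain NumPy; this is a minimal illustration of the update rule, not the actual tf.keras implementation:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, following steps 1-5 above."""
    m = beta1 * m + (1 - beta1) * g          # step 2: biased first moment
    v = beta2 * v + (1 - beta2) * g ** 2     # step 3: biased second moment
    m_hat = m / (1 - beta1 ** t)             # step 4: bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step 5: parameter update
    return w, m, v

# Minimise the toy loss L(w) = w^2, whose gradient (step 1) is 2w
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
# w is now close to the minimum at 0
```

Note how the effective step size is roughly the learning rate while the gradient keeps its sign, because the first and second moment estimates cancel the gradient's magnitude.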

The TensorFlow implementation of Adam, tf.keras.optimizers.Adam, accepts several arguments as input:

  • learning_rate: a float, or a schedule that adjusts the learning rate during training.

  • beta_1: a float value or constant float tensor giving the exponential decay rate for the 1st moment estimates, i.e. the means of the gradients. Default = 0.9.

  • beta_2: a float value or constant float tensor giving the exponential decay rate for the 2nd moment estimates, i.e. the uncentered variances of the gradients. Default = 0.999.

  • amsgrad: True/False. Whether to apply the AMSGrad variant of the algorithm presented in the paper On the Convergence of Adam and Beyond. Default = False.

  • weight_decay: if set, applies the given amount of weight decay to the parameters.

Other things to consider when optimising¶

The batch size determines how many training examples are processed before the model's internal parameters are updated. Smaller batch sizes can speed up the training per epoch because the model updates more frequently. However, this can lead to less stable convergence, i.e. the training loss may fluctuate more. A small batch size can be beneficial when the model is overfitting (the training loss is significantly lower than the validation loss).

A larger batch size leads to slower training per epoch and requires more memory, but can yield more stable parameter updates. The model usually converges more smoothly, but might not generalise as well because large batches tend to settle into "sharp minima".

Another way to tune the optimisation is to use a learning rate scheduler. Why? As training progresses, the model gets closer to a good solution. Smaller learning rates then allow for finer adjustments to the model's weights, helping it converge to a better minimum without overshooting (see the gradient descent examples in the beginning). I have implemented a schedule that reduces the learning rate if the validation loss does not improve for 5 epochs.

After training, the model is saved in a .keras file. The .keras is a zip archive that contains:

  • The architecture
  • The weights
  • The optimizer's status
# @title Compiling and training with 3 epochs
if retrain_model:
  adam = tf.keras.optimizers.Adam(learning_rate = 0.0001, beta_1 = 0.9, \
                                  beta_2 = 0.999, amsgrad = False)
  model_3_facialKeyPoints = Model(inputs = X_input, outputs = X)
  model_3_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                  metrics = ['accuracy'])

  #Save the best model with least validation loss here
  checkpoint  = ModelCheckpoint(filepath = "FacialKeyPoints_model_3.keras", \
                                verbose = 1, save_best_only = True)

  history3 = model_3_facialKeyPoints.fit(X_train_kp, y_train_kp, batch_size = 32, \
                    epochs = 3, validation_split = 0.05, callbacks=[checkpoint])
No description has been provided for this image
# @title Compiling and training with batch_size = 64, epochs = 80, and decay of the learning rate on plateau
%%capture

if retrain_model:
  initial_learning_rate=0.001

  # compile model
  adam = tf.keras.optimizers.Adam(learning_rate = initial_learning_rate, beta_1 = 0.9, \
                                  beta_2 = 0.999, amsgrad = False)
  model_1_facialKeyPoints = Model(inputs = X_input, outputs = X)
  model_1_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                  metrics = ['accuracy'])
  # Callbacks: reduce lr on plateau
  reduce_lr = ReduceLROnPlateau(
      monitor='val_loss',
      factor=0.65,
      patience=5,
      min_lr=1e-6,
      verbose=1
  )

  # Callbacks: save best model
  checkpoint = ModelCheckpoint(
      filepath="Models/FacialKeyPoints_model_1.keras",
      verbose=1,
      save_best_only=True
  )

  # Callbacks: logs epoch results to CSV
  csv_logger = CSVLogger(
      'Models/training_history_model_f.csv',
      append=True,         # keep adding if file exists
      separator=','        # comma-separated
  )
  # fit with CSVLogger included
  history = model_1_facialKeyPoints.fit(
      X_train_kp, y_train_kp,
      batch_size=64,
      epochs=80,
      validation_split=0.05,
      callbacks=[checkpoint, reduce_lr, csv_logger]
  )
No description has been provided for this image

Assessing the trained key facial points detection model performance¶

# load the model architecture f = final
adam = tf.keras.optimizers.Adam(learning_rate = 0.0001, beta_1 = 0.9, \
                                beta_2 = 0.999, amsgrad = False)
model_1_facialKeyPoints = tf.keras.models.load_model("Models/FacialKeyPoints_model_1.keras")
model_1_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                metrics = ['accuracy'])
# Evaluate the model
# The model from materials has loss: 8.3705 accuracy: 0.85280377 with the X_test,y_test set.

result = model_1_facialKeyPoints.evaluate(X_test_kp, y_test_kp)
54/54 ━━━━━━━━━━━━━━━━━━━━ 3s 22ms/step - accuracy: 0.7946 - loss: 29.5491

Part 2. Facial Expression detection¶

In this second part of the project, I train the second model which will classify emotions. The data contains images that belong to 5 categories:

  • 0 = Angry
  • 1 = Disgust
  • 2 = Sad
  • 3 = Happy
  • 4 = Surprise

The images in the data set are of size 48px * 48px, whereas the keypoint detection model expects 96px * 96px input. Therefore they need to be resized so that the expression detection model can be run together with the facial keypoint detection model.
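As a minimal sketch of the upsampling step, assuming a NumPy array as the image; this shows only nearest-neighbour repetition, whereas the notebook's actual pipeline resizes with interpolation:

```python
import numpy as np

# Stand-in for one 48x48 grayscale image from the expression data set
img48 = np.linspace(0.0, 255.0, 48 * 48, dtype=np.float32).reshape(48, 48)

# Nearest-neighbour upsampling to 96x96: every pixel is repeated in a 2x2 block.
# An interpolating resize (e.g. bilinear) would instead blend neighbouring pixels.
img96 = np.repeat(np.repeat(img48, 2, axis=0), 2, axis=1)
```

After this step the image has the 96px * 96px shape that both models can consume.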

Below is an example of an original image, results from resizing and final image after interpolation.

No description has been provided for this image

Visualising the images in the dataset with the emotions¶

No description has been provided for this image
expression_df.head()
emotion pixels
0 0 [[69.316925, 73.03865, 79.13719, 84.17186, 85....
1 0 [[151.09435, 150.91393, 150.65791, 148.96367, ...
2 2 [[23.061905, 25.50914, 29.47847, 33.99843, 36....
3 2 [[20.083221, 19.079437, 17.398712, 17.158691, ...
4 3 [[76.26172, 76.54747, 77.001785, 77.7672, 78.4...

Below are the counts of each emotion category. Our data is highly imbalanced, with very few images portraying disgust and many images in the happy category.

No description has been provided for this image
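One common mitigation for such imbalance is class weighting; whether it is applied in this notebook is not shown, so the sketch below uses hypothetical counts:

```python
import numpy as np

# Hypothetical per-class image counts (0=Angry ... 4=Surprise); the real
# counts come from the bar chart above / expression_df['emotion'].
counts = np.array([3000, 400, 5000, 9000, 3500])
total, n_classes = counts.sum(), len(counts)

# Inverse-frequency weights: rare classes get proportionally larger weight,
# and a perfectly balanced data set would give every class weight 1.0
class_weight = {i: total / (n_classes * c) for i, c in enumerate(counts)}
```

The resulting dict can be passed to Keras via model.fit(..., class_weight=class_weight), so that misclassifying a rare "disgust" image costs more than misclassifying a common "happy" one.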

Data preparation and image augmentation¶

X shape (24568, 96, 96, 1)
y shape (24568, 5)
X train shape (22111, 96, 96, 1)
y train shape (22111, 5)
X val shape (1228, 96, 96, 1)
y val shape (1228, 5)
X test shape (1229, 96, 96, 1)
y test shape (1229, 5)

Data preprocessing¶

In the data preprocessing I will again normalize the data and perform image augmentation, as was done in Part 1 of the project.

First, I normalize the data to contain values between 0 and 1. Then, I use the following image augmentation techniques:

  1. rotating up to 15 degrees
  2. shifting the image horizontally up to 0.1 * image width
  3. shifting the image vertically up to 0.1 * image height
  4. shearing the image up to 0.1
  5. zooming the image up to 10 %
  6. horizontally flipping the image
  7. vertically flipping the image
  8. adjusting the brightness

The spaces outside the image boundaries are filled by replicating the nearest pixels.
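The augmentations listed above map naturally onto Keras' ImageDataGenerator; the sketch below mirrors the listed parameters, with the brightness range and the variable name being my own assumptions rather than values from the notebook:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Sketch of the augmentation pipeline described above (values mirror the list)
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # normalise pixel values to [0, 1]
    rotation_range=15,            # 1. rotate up to 15 degrees
    width_shift_range=0.1,        # 2. shift horizontally up to 0.1 * width
    height_shift_range=0.1,       # 3. shift vertically up to 0.1 * height
    shear_range=0.1,              # 4. shear up to 0.1
    zoom_range=0.1,               # 5. zoom up to 10 %
    horizontal_flip=True,         # 6. horizontal flips
    vertical_flip=True,           # 7. vertical flips
    brightness_range=(0.9, 1.1),  # 8. brightness adjustment (range assumed)
    fill_mode='nearest',          # replicate nearest pixels outside boundaries
)
```

A generator configured this way can be passed to training via train_datagen.flow(X_train, y_train, batch_size=...).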

Build and train Deep Learning model for facial expression classification¶

The model I will build has the following architecture:

Emotion Detection model: INPUT → Zero padding → Conv2D → BatchNorm, ReLU → MaxPool2D → Res-block → Res-block → AveragePooling2D → Flatten → Dense, ReLU, Dropout → OUTPUT
# @title Emotion recognition model

input_shape = (96,96,1)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage 1
X = Conv2D(64, (7,7), strides = (2,2), name = 'conv1', kernel_initializer=glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)

# Stage 2
X = res_block(X, filter = [64,64,256], stage = 'res2')

# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res3')

# Stage 4 (optional)
#X = res_block(X, filter= [256,256,1024], stage = 'res4')

# Average pooling
X = AveragePooling2D((4,4), name = 'avg_pool')(X)

# Final layer
X = Flatten()(X)
X  = Dense(5, activation = 'softmax', name = 'dense', kernel_initializer=glorot_uniform(seed=0))(X)

Emotion_det_model_2 = Model(inputs = X_input, outputs = X, name = 'Resnet18')
Model: "Resnet18"
+---------------------+-------------------+------------+-------------------+
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
+---------------------+-------------------+------------+-------------------+
| input_layer_3       | (None, 96, 96, 1) |          0 | -                 |
| (InputLayer)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| zero_padding2d_3    | (None, 102, 102,  |          0 | input_layer_3[0]… |
| (ZeroPadding2D)     | 1)                |            |                   |
+---------------------+-------------------+------------+-------------------+
| conv1 (Conv2D)      | (None, 48, 48,    |      3,200 | zero_padding2d_3… |
|                     | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| bn1                 | (None, 48, 48,    |        256 | conv1[0][0]       |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_57       | (None, 48, 48,    |          0 | bn1[0][0]         |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_15    | (None, 23, 23,    |          0 | activation_57[0]… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 23, 23,    |      4,160 | max_pooling2d_15… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_16    | (None, 11, 11,    |          0 | res2convblock_co… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_a  | (None, 11, 11,    |        256 | max_pooling2d_16… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_58       | (None, 11, 11,    |          0 | res2convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |     36,928 | activation_58[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_b  | (None, 11, 11,    |        256 | res2convblock_co… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_59       | (None, 11, 11,    |          0 | res2convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 23, 23,    |     16,640 | max_pooling2d_15… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |     16,640 | activation_59[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_17    | (None, 11, 11,    |          0 | res2convblock_co… |
| (MaxPooling2D)      | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_c  | (None, 11, 11,    |      1,024 | res2convblock_co… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_s… | (None, 11, 11,    |      1,024 | max_pooling2d_17… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_18 (Add)        | (None, 11, 11,    |          0 | res2convblock_bn… |
|                     | 256)              |            | res2convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_60       | (None, 11, 11,    |          0 | add_18[0][0]      |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_a    | (None, 11, 11,    |     16,448 | activation_60[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_a      | (None, 11, 11,    |        256 | res2iden1_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_61       | (None, 11, 11,    |          0 | res2iden1_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_b    | (None, 11, 11,    |     36,928 | activation_61[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_b      | (None, 11, 11,    |        256 | res2iden1_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_62       | (None, 11, 11,    |          0 | res2iden1_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_c    | (None, 11, 11,    |     16,640 | activation_62[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_c      | (None, 11, 11,    |      1,024 | res2iden1_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_19 (Add)        | (None, 11, 11,    |          0 | res2iden1_bn_c[0… |
|                     | 256)              |            | activation_60[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_63       | (None, 11, 11,    |          0 | add_19[0][0]      |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_a    | (None, 11, 11,    |     16,448 | activation_63[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_a      | (None, 11, 11,    |        256 | res2iden2_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_64       | (None, 11, 11,    |          0 | res2iden2_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_b    | (None, 11, 11,    |     36,928 | activation_64[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_b      | (None, 11, 11,    |        256 | res2iden2_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_65       | (None, 11, 11,    |          0 | res2iden2_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_c    | (None, 11, 11,    |     16,640 | activation_65[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_c      | (None, 11, 11,    |      1,024 | res2iden2_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_20 (Add)        | (None, 11, 11,    |          0 | res2iden2_bn_c[0… |
|                     | 256)              |            | activation_63[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_66       | (None, 11, 11,    |          0 | add_20[0][0]      |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 11, 11,    |     32,896 | activation_66[0]… |
| (Conv2D)            | 128)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_18    | (None, 5, 5, 128) |          0 | res3convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_a  | (None, 5, 5, 128) |        512 | max_pooling2d_18… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_67       | (None, 5, 5, 128) |          0 | res3convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 5, 5, 128) |    147,584 | activation_67[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_b  | (None, 5, 5, 128) |        512 | res3convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_68       | (None, 5, 5, 128) |          0 | res3convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 11, 11,    |    131,584 | activation_66[0]… |
| (Conv2D)            | 512)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 5, 5, 512) |     66,048 | activation_68[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_19    | (None, 5, 5, 512) |          0 | res3convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_c  | (None, 5, 5, 512) |      2,048 | res3convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_s… | (None, 5, 5, 512) |      2,048 | max_pooling2d_19… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_21 (Add)        | (None, 5, 5, 512) |          0 | res3convblock_bn… |
|                     |                   |            | res3convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_69       | (None, 5, 5, 512) |          0 | add_21[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_conv_a    | (None, 5, 5, 128) |     65,664 | activation_69[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_bn_a      | (None, 5, 5, 128) |        512 | res3iden1_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_70       | (None, 5, 5, 128) |          0 | res3iden1_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_conv_b    | (None, 5, 5, 128) |    147,584 | activation_70[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_bn_b      | (None, 5, 5, 128) |        512 | res3iden1_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_71       | (None, 5, 5, 128) |          0 | res3iden1_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_conv_c    | (None, 5, 5, 512) |     66,048 | activation_71[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_bn_c      | (None, 5, 5, 512) |      2,048 | res3iden1_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_22 (Add)        | (None, 5, 5, 512) |          0 | res3iden1_bn_c[0… |
|                     |                   |            | activation_69[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_72       | (None, 5, 5, 512) |          0 | add_22[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_conv_a    | (None, 5, 5, 128) |     65,664 | activation_72[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_bn_a      | (None, 5, 5, 128) |        512 | res3iden2_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_73       | (None, 5, 5, 128) |          0 | res3iden2_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_conv_b    | (None, 5, 5, 128) |    147,584 | activation_73[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_bn_b      | (None, 5, 5, 128) |        512 | res3iden2_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_74       | (None, 5, 5, 128) |          0 | res3iden2_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_conv_c    | (None, 5, 5, 512) |     66,048 | activation_74[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_bn_c      | (None, 5, 5, 512) |      2,048 | res3iden2_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_23 (Add)        | (None, 5, 5, 512) |          0 | res3iden2_bn_c[0… |
|                     |                   |            | activation_72[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_75       | (None, 5, 5, 512) |          0 | add_23[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| avg_pool            | (None, 1, 1, 512) |          0 | activation_75[0]… |
| (AveragePooling2D)  |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| flatten_3 (Flatten) | (None, 512)       |          0 | avg_pool[0][0]    |
+---------------------+-------------------+------------+-------------------+
| dense (Dense)       | (None, 5)         |      2,565 | flatten_3[0][0]   |
└---------------------┴-------------------┴------------┴-------------------┘
 Total params: 1,174,021 (4.48 MB)
 Trainable params: 1,165,445 (4.45 MB)
 Non-trainable params: 8,576 (33.50 KB)
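As a sanity check, the per-layer parameter counts in the summary above can be reproduced from the layer shapes. A small sketch (the layer names in the comments refer to the summary; the formulas are the standard Keras parameter counts):

```python
def conv2d_params(kh, kw, c_in, c_out):
    # (kernel height * kernel width * input channels) weights per filter,
    # plus one bias per filter
    return (kh * kw * c_in + 1) * c_out

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance (only the first two train)
    return 4 * channels

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out

print(conv2d_params(1, 1, 512, 128))  # res3iden1_conv_a: 65664
print(conv2d_params(3, 3, 128, 128))  # res3iden1_conv_b: 147584
print(conv2d_params(1, 1, 128, 512))  # res3iden1_conv_c: 66048
print(batchnorm_params(128))          # res3iden1_bn_a:   512
print(dense_params(512, 5))           # dense:            2565
```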


  
batch_size = 64
print(f"Training samples: {len(X_train_ed)}")
print(f"Batch size: {batch_size}")
steps_per_epoch = np.ceil(len(X_train_ed) / batch_size).astype(int)
print(f"Steps per epoch: {steps_per_epoch}")
Training samples: 22111
Batch size: 64
Steps per epoch: 346
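The ceiling division above guarantees that the final, partially filled batch is still consumed each epoch; a quick check of the arithmetic:

```python
import numpy as np

n_samples, batch_size = 22111, 64
steps = int(np.ceil(n_samples / batch_size))
assert steps * batch_size >= n_samples        # every sample fits into some step
assert (steps - 1) * batch_size < n_samples   # no step would be completely empty
print(steps)  # 346
```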
No description has been provided for this image

Evaluate model¶

Confusion matrix, accuracy, precision, and recall

39/39 ━━━━━━━━━━━━━━━━━━━━ 4s 37ms/step - accuracy: 0.2052 - loss: 2.0693
No description has been provided for this image
39/39 ━━━━━━━━━━━━━━━━━━━━ 4s 44ms/step
No description has been provided for this image
No description has been provided for this image
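The evaluation cells above rely on scikit-learn; here is a minimal, self-contained sketch of how the confusion matrix and accuracy could be computed. The `true_classes` and `predicted_classes` arrays below are synthetic stand-ins (in the notebook, the real arrays come from `np.argmax` over the model's softmax outputs):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic labels mimicking a model that predicts class 0 for everything
true_classes = np.array([0, 1, 2, 3, 4, 0, 0, 2])
predicted_classes = np.zeros_like(true_classes)

cm = confusion_matrix(true_classes, predicted_classes, labels=[0, 1, 2, 3, 4])
print(cm)  # all the mass ends up in the first column
print(accuracy_score(true_classes, predicted_classes))  # 0.375
```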
print(classification_report(true_classes, predicted_classes))
              precision    recall  f1-score   support

           0       0.20      1.00      0.33       245
           1       0.00      0.00      0.00        22
           2       0.00      0.00      0.00       319
           3       0.00      0.00      0.00       458
           4       0.00      0.00      0.00       185

    accuracy                           0.20      1229
   macro avg       0.04      0.20      0.07      1229
weighted avg       0.04      0.20      0.07      1229

/usr/local/lib/python3.11/dist-packages/sklearn/metrics/_classification.py:1565: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

The report above shows that the model has collapsed to predicting class 0 for every sample: recall is 1.00 for class 0 and 0.00 for every other class, and the overall accuracy of 0.20 is simply the share of class-0 samples (245 of 1229) in the test set. Precision (the fraction of samples predicted to be class x that actually are class x) and recall (the fraction of class-x samples that are correctly labeled as x) are therefore zero for the remaining classes, which is exactly what the `UndefinedMetricWarning` above points out. The f1-score is the harmonic mean of precision and recall:

$F_1 = 2 \cdot \frac{\text{precision} \ \times \ \text{recall}}{\text{precision} \ + \ \text{recall}}$
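A small worked example: scikit-learn's `f1_score` matches the harmonic mean $2pr/(p+r)$ of `precision_score` and `recall_score` computed on the same labels.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

p = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
r = recall_score(y_true, y_pred)     # TP / (TP + FN) = 2/3
f1 = f1_score(y_true, y_pred)

assert np.isclose(f1, 2 * p * r / (p + r))
print(p, r, f1)
```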

Part 3. Combining the key point detection and facial expression recognition models¶


def predict(X_test):
  # Predict the facial keypoints
  df_predict = model_1_facialKeyPoints.predict(X_test)

  # Predict the emotion class
  df_emotion = np.argmax(model_emotion.predict(X_test), axis=-1)

  # Reshape the emotion array from (n,) to (n, 1)
  df_emotion = np.expand_dims(df_emotion, axis=1)

  # Convert the keypoint predictions into a dataframe
  df_predict = pd.DataFrame(df_predict, columns=columns)

  # Add the emotion column to the predicted dataframe
  df_predict['emotion'] = df_emotion

  return df_predict

df_predict = predict(X_test_ed)
39/39 ━━━━━━━━━━━━━━━━━━━━ 3s 46ms/step
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
df_predict.head()
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y emotion
0 39.582371 40.585831 48.418911 38.578739 40.658134 40.747437 37.871552 41.310287 46.310127 39.798634 ... 43.466816 41.438889 48.560757 50.611870 46.908916 43.847321 49.088165 44.476219 48.404957 0
1 32.487503 33.801414 58.870758 36.044651 37.678078 35.521343 27.091919 33.875061 53.634003 36.909916 ... 58.932339 28.900879 72.052238 53.224747 74.112793 40.882328 73.872482 40.393787 78.887833 0
2 61.829601 37.771996 34.182137 35.791634 55.839569 38.763092 67.759720 38.971661 39.653202 37.388924 ... 62.105545 55.127003 78.965759 33.246048 77.127159 43.460148 77.713341 43.174774 79.463249 0
3 44.510548 31.663456 43.904709 30.674473 43.713665 33.162598 45.013626 32.394203 44.206051 32.559414 ... 53.969776 48.071609 66.702530 46.926041 65.904129 45.764507 67.768799 46.450779 70.521561 0
4 31.015606 38.402122 63.297882 41.537052 37.304489 39.922863 24.394827 38.475166 56.901787 42.025349 ... 59.471165 28.988894 74.141426 58.611542 77.316299 43.932095 74.681374 43.413620 80.536591 0

5 rows × 31 columns

Plotting test images with the combined models' predictions.

No description has been provided for this image
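The plotting code itself is not visible in this export; below is a minimal sketch of how such a grid could be drawn. The images, keypoints, and emotion labels are synthetic stand-ins (in the notebook they would come from `X_test_ed` and the combined `predict` output, where keypoint columns alternate x and y coordinates):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
images = rng.random((4, 96, 96))          # stand-in for test images
keypoints = rng.uniform(10, 86, (4, 30))  # stand-in for the 30 keypoint columns
emotions = np.zeros(4, dtype=int)         # stand-in for the emotion column

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for i, ax in enumerate(axes):
    ax.imshow(images[i], cmap='gray')
    # x coordinates sit at even indices, y coordinates at odd indices
    ax.scatter(keypoints[i, ::2], keypoints[i, 1::2], s=10, c='red', marker='x')
    ax.set_title(f"emotion: {emotions[i]}")
    ax.axis('off')
fig.savefig("combined_predictions.png")
```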